BRIEF SUMMARY:

1 BACKGROUND OF THE INVENTION 

2 The present invention relates generally to databases, and more particularly to
matching new customer records to existing customer records in a large business
database.

3 A large business database often has duplications of the same customer records. 
The duplications are likely due to misspelling errors or because of multiple 
methods of entering the customer records into the database. These duplications 
result in several problems for the end-user. One problem is that a customer 
whose records have been duplicated may receive multiple mailings from the end- 
user. Another problem is that the end-user may not ever have consistent 
information about each customer. The customer information may be inconsistent 
because every time the customer record has to be updated, only one record is 
updated. There is no assurance that the most recently updated record will be 
revised, which results in inconsistent information. A third problem with 
duplicated records is that the end-user is unable to determine how much
business activity has been generated by a particular customer, retrieval 
systems. These library-style catalogue retrieval systems can search a large 
database of records to find matches that are similar to a query entered by an 
end-user. Typically, these library-style catalogue retrieval systems use 
phonetic-based algorithms to determine the closeness of names or addresses or
word strings. A problem with these library-style catalogue retrieval systems 
is that they are only useful for searching through an existing customer 
database and are unable to compress a large customer database having multiple 
repetitions of customer records. Therefore, there is a need for a methodology 
that processes new customer records, checks the new records for poor quality, 
normalizes and validates the new records, and matches the new records to 
existing customer records in order to determine uniqueness. Normalizing, 
validating, and matching the customer records will allow an end-user to avoid 
wasted mailings, maintain consistent information about each customer, and 
determine how much business activity has been generated by a particular 
customer. 

4 SUMMARY OF THE INVENTION 

5 Therefore, it is a primary objective of the present invention to provide a 
method and system that normalizes and validates new customer records, and 
matches the new records to existing customer records in a large database. 

6 Another object of the present invention is to enable end-users of large 
business databases to avoid wasted mailings, maintain consistent information 
about each of their customers, and determine how much business activity has 
been generated by a particular customer. 

7 Thus, in accordance with the present invention, there is provided a method and 
a system for matching a new data set containing a record and a collection of 
fields to an existing data set in a database containing a plurality of records 
each having a collection of fields. In this embodiment, the new data set is 
initially read. Each of the fields from the record in the new data set is
then validated. The validated fields in the record in the new data set are
then normalized into a standard form. Next, a hash key is selected for
generating a candidate set of records from the existing data set in the 
database that likely matches the record from the new data set. The hash key is 
then applied to the plurality of records in the existing data set of the 
database to generate the candidate set of records. The record from the new
data set is then matched to each of the records in the candidate set. The
existing data set in the database is then updated according to the results of
the match between the record from the new data set and the records in the
candidate set.
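
By way of illustration only, the sequence above (validate, normalize, select a hash key, generate a candidate set) might be sketched as follows. The field names, the quality check in validate, and the choice of ZIP code plus first letter of the name as the hash key are assumptions made for this sketch and are not taken from the specification.

```python
from collections import defaultdict

def validate(record, required=("name", "zip")):
    """Hypothetical quality check: every required field must be non-empty."""
    return all(record.get(field, "").strip() for field in required)

def normalize(record):
    """Put each field into a standard form: upper case, single spaces, trimmed."""
    return {k: " ".join(str(v).upper().split()) for k, v in record.items()}

def hash_key(record):
    """Hypothetical hash key: ZIP code plus the first letter of the name."""
    return record["zip"] + ":" + record["name"][:1]

def build_index(existing_records):
    """Group the existing records by hash key so candidates can be looked up."""
    index = defaultdict(list)
    for rec in existing_records:
        index[hash_key(rec)].append(rec)
    return index

def candidate_set(index, new_record):
    """Existing records that share the new record's hash key."""
    return index.get(hash_key(new_record), [])

existing = [normalize(r) for r in (
    {"name": "Acme Corp", "zip": "12345"},
    {"name": "Apex Ltd", "zip": "12345"},
    {"name": "Zenith Inc", "zip": "99999"},
)]
index = build_index(existing)

new = {"name": "  acme   corp ", "zip": "12345"}
is_valid = validate(new)
new = normalize(new)
cands = candidate_set(index, new)
```

Only the records in cands would then be passed to the matching step, which is why the choice of hash key determines how many comparisons are needed per new record.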

8 In accordance with another embodiment of the present invention, there is 
provided a method and system for generating rules for matching data in a 
database containing a plurality of records each having a collection of fields. 
In this embodiment, a sample of training data is obtained from the database. 
Similar pairs of records from the sample of training data are then identified. 
Field matching functions are applied to each of the corresponding fields in 
the similar pairs of records. Each field matching function generates a score 
indicating the strength of the match between items in the field. An 
intermediate file of vectors containing matching scores for all of the fields 
from each of the similar pairs of records is then generated. The intermediate
file of vectors is then converted into a plurality of matching rules for
matching data in the database. The plurality of matching rules can then be
used for matching a new data set containing a record and a collection of
fields to an existing data set in a database containing a plurality of records
each having a collection of fields.
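
As a non-limiting sketch, the generation of score vectors from similar pairs of records might look as follows. The use of difflib.SequenceMatcher as a string matching function, the exact-match function for the ZIP field, and the field names are assumptions of this sketch; the conversion of the vectors into matching rules is not shown because this summary does not specify it.

```python
from difflib import SequenceMatcher

def string_score(a, b):
    """Similarity in [0, 1] between two strings (stand-in field matching function)."""
    return SequenceMatcher(None, a, b).ratio()

def exact_score(a, b):
    """1.0 when the two items are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0

# Hypothetical assignment of one matching function per field.
FIELD_MATCHERS = {"address": string_score, "name": string_score, "zip": exact_score}
FIELD_ORDER = sorted(FIELD_MATCHERS)  # fixed order so that vectors are comparable

def score_vector(rec_a, rec_b):
    """One matching score per field for a pair of records."""
    return [FIELD_MATCHERS[f](rec_a[f], rec_b[f]) for f in FIELD_ORDER]

similar_pairs = [
    ({"name": "ACME CORP", "address": "1 MAIN ST", "zip": "12345"},
     {"name": "ACME CORP", "address": "1 MAIN ST", "zip": "12345"}),
    ({"name": "ACME CORP", "address": "1 MAIN ST", "zip": "12345"},
     {"name": "ACME CORPORATION", "address": "1 MAIN STREET", "zip": "12346"}),
]

# The "intermediate file of vectors", held here as a list in memory.
vectors = [score_vector(a, b) for a, b in similar_pairs]
```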

9 While the present invention will hereinafter be described in connection with a 
preferred embodiment and method of use, it will be understood that it is not 
intended to limit the invention to this embodiment. Instead, it is intended to 
cover all alternatives, modifications and equivalents as may be included 
within the spirit and scope of the present invention as defined by the 
appended claims.

DRAWING DESCRIPTION: 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a system for data validation and matching according to 
the present invention; 

FIG. 2 is a flow chart describing the data validation and matching according to the 
present invention; 

FIG. 3 is an example of a fixed general business file format that may be used in 
the present invention; 

FIG. 4 is a flow chart describing the matching process in more detail;

FIG. 5 is a screen view of an interface used for the matching process; and 

FIG. 6 discloses a flow chart describing the process of examining pending data for 
a match. 



DETAILED DESCRIPTION: 

1 DETAILED DESCRIPTION OF THE PRESENT INVENTION 

2 The present invention discloses a data validation and matching tool for
processing raw business data from large business databases. The raw business
data includes a plurality of records each having a collection of fields and 
attributes. A block diagram of a system 10 for performing data validation and 
matching according to the present invention is shown in FIG. 1. The system 10 
includes a database of existing customer records 12 and a database of new 
customer records 14. The database of existing customer records 12 can be a 
large database containing over 200,000 records. Each record has a collection 
of fields and attributes that are applicable for the particular business 
application. For example, some of the fields that may be used are business 
name, customer name, address, country, phone number, business codes, etc. The 
database of new customer records 14 can be as small as one record or as large 
as over 200,000 records. These records also have a collection of fields and 
attributes that are applicable to the business application. 

3 The system 10 also includes a computer such as a workstation or a personal 
computer. A representation of the functions performed by the computer is
shown as blocks 16, 18, 20, and 22. In particular, a validation and
normalization block 16 reads the data from the new records database 14 and 
checks the fields in each record for quality and normalizes the field 
information into a standard form. If the data is good, then a hash key 
selector 18 selects a hash key. Note that there may be one or more hash keys. 
A matcher 20 uses the hash key to select a set of candidates from all of the 
existing records in the database 12 with the same hash key. For example, the 
present invention will generate about 100 candidates for a 50,000 record 
database. The matcher 20 performs a matching operation between a new data
record from database 14 and each member of the candidate set. The matching 
operation, which is described below in further detail, creates a list of 
potential matches. If multiple hash keys are used, then the process will 
retrieve records based on a disjunction of the hash keys. However, once all 
the matching is done, the matcher 20 makes a decision whether to create a new 
customer record in database 12, update an existing record in database 12, or 
save the new data in a pending file 22 for resolution at a later time. 

4 FIG. 2 is a flow chart describing the operation of the data validation and
matching according to the present invention. The operation begins by reading
raw data from a record at 24. The data from the record is validated for
quality and standardized into a standard form at 26. Hash keys are selected at
28 by the hash key selector 18. At 30, a set of candidates from all of the
existing records in the database 12 with the same hash key is retrieved. The
matching operation is then performed at 32 between the new data record and 
each member of the candidate set, resulting in a list of potential matches. 
Based on the matching results, block 34 creates either a new record in 
database 12, or updates an existing record in database 12, or places the new 
record in a pending file for resolution at a later time. If there are more 
records in the raw data file at 36, then the next record is read and the steps 
at 26, 28, 30, 32, and 34 are repeated. The operation ends once there are no 
more records to be processed. 
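
The decision made at block 34 might be sketched as follows. The passage above does not state the criteria by which the matcher chooses among creating, updating, and deferring, so the two thresholds and the use of the best candidate score are purely hypothetical.

```python
def decide(match_scores, match_threshold=0.9, new_threshold=0.5):
    """Choose an action from the scores of a new record against its candidates.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    best = max(match_scores, default=0.0)
    if best >= match_threshold:
        return "update"    # confident match: update the existing record
    if best < new_threshold:
        return "create"    # nothing similar: create a new record
    return "pending"       # ambiguous: save in the pending file for later

# One decision per incoming record, mirroring the loop through step 36.
scores_per_record = [[0.95, 0.20], [], [0.70, 0.10]]
actions = [decide(scores) for scores in scores_per_record]
```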

5 Before validation and normalization, the raw data file from the new records is 
read. In the present invention, the data can arrive from many different 
hardware/software systems ranging from an off-the-shelf spreadsheet 
application to a mainframe dump that is in a fixed "General Business File" 
format. An example of a fixed general business file format 38 is shown in FIG.
3. The business file format includes a field number, field name, width of the 
field, and description of the field. Note that the business file format in 
FIG. 3 is only an example of possible fields and can change depending upon the 
particular business application. For instance, files of hospital patients may 
include patient name, date of birth, hospital ID, and patient sex. 
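
For illustration, a record in a fixed format of this kind might be read as sketched below. FIG. 3 is not reproduced here, so the field names and widths in LAYOUT are hypothetical and stand in for whatever the particular business application defines.

```python
# Hypothetical layout: (field name, width in characters), in file order.
LAYOUT = [("business_name", 20), ("phone", 12), ("country", 2)]

def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of trimmed field values."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

line = "Acme Corp".ljust(20) + "555-0100".ljust(12) + "US"
record = parse_fixed_width(line)
```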



US-PAT-NO: 5740421
DOCUMENT-IDENTIFIER: US 5740421 A
File: USPT
Apr 14, 1998



TITLE: Associative search method for heterogeneous databases with an integration
mechanism configured to combine schema-free data models such as a hyperbase

DATE-ISSUED: April 14, 1998 



INVENTOR-INFORMATION:

NAME            CITY       STATE   ZIP CODE   COUNTRY
Palmon; Eran    Tel Aviv                      IL



ASSIGNEE-INFORMATION:

NAME                         CITY       STATE   ZIP CODE   COUNTRY   TYPE CODE
DTL Data Technologies Ltd.   Tel-Aviv                      IL        03



APPL-NO: 08/415601
DATE FILED: April 3, 1995

INT-CL: [06] G06F 17/30

US-CL-ISSUED: 395/604; 395/603, 395/611, 395/612, 395/613
US-CL-CURRENT: 707/4; 707/100, 707/101, 707/102, 707/3

FIELD-OF-SEARCH: 395/600, 395/611, 395/612, 395/613, 395/604, 395/603
PRIOR-ART-DISCLOSED:

U.S. PATENT DOCUMENTS

PAT-NO    ISSUE-DATE     PATENTEE-NAME   US-CL
5416917   May 1995       Adair et al.    395/500
5481703   January 1996   Kato            395/600
5515534   May 1996       Chuah et al.    395/612



OTHER PUBLICATIONS

Chawathe et al., "The TSIMMIS Project: Integration of Heterogeneous Information
Sources", in Proceedings of IPSJ Conference, Tokyo, Japan, Oct. 1994.

Uffe Knock Wiil, "Issues in the Design of EHTS: A Multiuser Hypertext System for
Collaboration", IEEE on CD ROM, pp. 629-639, Jan. 1992.

"HyperBase Developer Personal HyperBase",
http://wwwetb.nlm.nih.gov/author/hyp rbase.html, Jul. 1992.

S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou,
J. Ullman, and J. Widom: "The TSIMMIS Project: Integration of Heterogeneous
Information Sources," in Proceedings of IPSJ Conference, Tokyo, Japan, Oct. 1994.
Available via anonymous ftp from db.stanford.edu as
/pub/chawathe/1994/tsimmis-overview.ps.

Y. Papakonstantinou, H. Garcia-Molina and J. Widom: "Object Exchange Across
Heterogeneous Information Sources," in Proceedings of IEEE International Conference
on Data Engineering, 6-10 Mar. 1995, Taipei, Taiwan. Available via anonymous ftp
from db.stanford.edu as
/pub/papakonstantinou/1994/object-exchange-heterogeneous-is.ps.

H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J.
Widom: "Integrating and Accessing Heterogeneous Information Sources in TSIMMIS," in
Working Notes of AAAI Spring Symposium on db.stanford.edu
as /pub/ullman/1995/tsimmis-abstract-aaai.ps.

M.W. Bright, A.R. Hurson, S. Pakzad, Automatic Resolution of Semantic Heterogeneity
in Multidatabases, in ACM Transactions on Database Systems, vol. 19, No. 2, Jun. 1994,
pp. 212-253.

M. Gyssens, J. Paradaens, D. Van Gucht, A Grammar-Based Approach Towards Unifying
Hierarchical Data Models, in SIGMOD Conference proceedings, 1989, pp. 263-272.

A. Poulovassilis, M. Levene, A Nested-Graph Model for the Representation and
Manipulation of Complex Objects, in ACM Transaction on Information Systems, vol.
12, No. 1, Jan. 1994, pp. 35-68.

ART-UNIT: 237 

PRIMARY-EXAMINER: Black; Thomas G. 
ASSISTANT-EXAMINER: Robinson; Greta L. 
ATTY-AGENT-FIRM: Rosenfeld; Dov 



ABSTRACT:

A method of performing an associative search on a set of heterogeneous databases is 
described, the method implemented on a general purpose computer. The method 
comprises converting each database of the set of databases into a schema-free
structure called a hyperbase. The hyperbases corresponding to each database of the
set of databases are combined into a single combined hyperbase, and that single 
hyperbase is normalized into a single normalized hyperbase. An associative search 
on the single hyperbase includes providing a set of input words. The method of the 
present invention determines an answer, which is that sub-hyperbase of the 
hyperbase to be searched which has minimum "cost" according to a criterion. Once an 
answer is determined, the answer is displayed to the user. 

88 Claims, 22 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 11 



BRIEF SUMMARY: 

1 MICROFICHE APPENDIX 

2 A 68-page microfiche appendix consisting of one sheet and 57 frames microfiche
is submitted as part of this application and incorporated herein.

3 The computer programs in the microfiche appendix are copyright DTL Data
Technologies Ltd.

4 COPYRIGHTED MATERIAL 

5 A portion of the disclosure of this patent document contains material which is
subject to copyright protection. The copyright owner has no objection to the
facsimile reproduction by anyone of the patent document or the patent
disclosure, as it appears in the Patent and Trademark Office patent file or
records, but otherwise reserves all copyright rights whatsoever.

6 BACKGROUND OF THE INVENTION 

7 A. Field of the Invention 

8 The field of the present invention is searching heterogeneous database systems 
and other systems for maintaining information in computers. 

9 B. Some Definitions 

10 The present invention deals with maintaining information, where information is
anything that can be displayed to and that is comprehensible to humans, and
maintaining information is carrying out on a computer the necessary processes
for storing, retrieving, and manipulating information. In particular, the
present invention deals with retrieving information.

11 The present invention is applicable to all information, including textual and 
pictorial information. Textual information is data that can be spoken. Such 
data is stored in words, where a word is a list of characters. While 
embodiments of the present invention are described for textual information, 
extension to other types of information would be apparent to one skilled in 
the art. 

12 Textual data can be either natural language text, consisting of words that are
arranged in sentences with natural grammar rules, or formatted data text (also
called row data text), consisting of words that are arranged in data
structures such as tables, trees, sets of tables, lists of records, etc.

13 The art of databases deals with all aspects of maintaining such formatted 
data, including storage on disks, concurrency control, etc. The method of the
present invention involves searching information, and deals with the
structural aspects of databases: the methods of (logically) representing 
information using computers, defining a query on databases, and displaying 
information. 

14 A data-model is a generic data structure such as a table. For example, a 
relational data model is a set of tables, where a table is a set of rows, each 
with the same number of columns, all rows in a particular column sharing the 
same attribute. Each table typically is "a file", each row of a table is "a
record of the file", each column in a row is a "field of a record", and each
column has its particular attributes, "the attributes of the field."

15 A schema, defined with reference to a particular data model, is an instance of 
the particular data model, with parameters provided to the data model and 
words associated to the data model and its attributes. For example, a schema 
can be a table defined to have some fixed number of columns, a table name, and 
the attributes of the columns. For example, FIG. 3 shows a relational model
consisting of two tables, EMPLOYEES and SALARIES. The schema for this
relational model is as follows. The first table 301 is called EMPLOYEES and
has two columns, the first column 303 is called ID and the other column 305 is
called NAME. That is, the first column has the attribute 307 ID and the second
has attribute 309 NAME. Each row (or record) of the table consists of a number
for an employee (in the ID column) and the employee's name (in the NAME
column). The second table 311 is called SALARIES and has three columns, having
attributes ID, 1993 and 1994 denoted in FIG. 3 by 313, 315, and 317,
respectively. Each record of SALARIES consists of a number for an employee (in
the ID column), that employee's salary for 1993 (in the 1993 column) and that
employee's salary for 1994 (in the 1994 column).

16 A database instance is an instance of a schema with the schema's parameters 
set. For example, if the schema is a table defined to have some fixed number 
of columns, a table name, with each of the columns having some attributes, 
then the instance would have a particular number of rows (or records) and the 
table's elements would have words associated to them. For the above schema for 
the relational model consisting of the tables EMPLOYEES and SALARIES, FIG. 3 
shows the database with EMPLOYEES having two rows (or records), the first 319
having ID 001 and NAME John Smith and the second 321 having ID 002 and NAME 
Mary Lu, and SALARIES having one row 323 having ID 001, 1993 $73,000 and 1994 
$80,000. This database is referred to below as PERSONNEL. 

17 A query is a function, that is, a mapping, from a particular database into a 
substructure of the database. That substructure is called the answer of the 
query. The answer has particular parameters (words) that may be both from the 
schema of the database and from the data itself. Thus an answer can be viewed 
as an instance of a data structure. 

18 For example, the following defines a query on PERSONNEL: select ID and 1993 
from SALARIES where ID=001 

19 This would provide an answer consisting of a table with two columns having 
attributes ID and 1993, respectively. These are from the attributes of the 
schema of PERSONNEL. The answer table would have one row consisting of 001 in 
column ID and $73,000 in column 1993. The contents of the row are from the 
data of database PERSONNEL. 
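
The PERSONNEL database and the query of the preceding paragraphs can be reproduced, by way of example only, with the sqlite3 module of the Python standard library. The choice of SQLite, the TEXT column types, and the storage of the salaries as formatted strings are assumptions of this sketch; the column names 1993 and 1994 must be quoted because they begin with a digit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: two tables with the column attributes shown in FIG. 3.
conn.execute('CREATE TABLE EMPLOYEES (ID TEXT, NAME TEXT)')
conn.execute('CREATE TABLE SALARIES (ID TEXT, "1993" TEXT, "1994" TEXT)')

# Database instance: the rows of PERSONNEL.
conn.executemany(
    'INSERT INTO EMPLOYEES VALUES (?, ?)',
    [("001", "John Smith"), ("002", "Mary Lu")],
)
conn.execute(
    'INSERT INTO SALARIES VALUES (?, ?, ?)',
    ("001", "$73,000", "$80,000"),
)

# Query: select ID and 1993 from SALARIES where ID=001.
cursor = conn.execute('SELECT ID, "1993" FROM SALARIES WHERE ID = ?', ("001",))
answer = cursor.fetchall()
```

The answer is a list holding one row, with the employee number in the first position and the 1993 salary in the second, matching the one-row answer table described above.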

20 In this specification, a note refers to a list of lines of words. Thus, an 
unstructured (except for lines) block of text is a note. 

21 A grid is a structure which can store grid-like items of information, for 
example, a spreadsheet. A grid can thus be thought of as a table where both 
the columns and rows have attributes (i.e., names). This is slightly different 
from a table in a relational database, where only columns have attributes or 
names, and each row is a record. 

22 A thesaurus is a set of phrases, where each phrase is a set of words. The 
phrases in a thesaurus entry typically each have similar meaning. In most 
cases, the phrases will consist of one word only, in which case a thesaurus is 
a list of words that have similar meaning. 

23 A stem is similar to a thesaurus in that it consists of a set of words. In 
this case, the words are all morphological variants of the same root. For 
example, a stem might contain the words salaries and salary. 
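
As an illustration of the two structures just defined, a thesaurus and a stem might be held in memory as sketched below. Phrases are represented as tuples of words so that they can be placed in sets, and the words "pay" and "wages" as well as the expand function are hypothetical; this sketch does not describe how the invention itself uses these structures.

```python
# Thesaurus: a list of entries, each entry a set of phrases of similar meaning.
# A phrase is a tuple of words (here every phrase is a single word).
THESAURUS = [{("salary",), ("pay",), ("wages",)}]

# Stems: each stem is a set of morphological variants of the same root.
STEMS = [{"salary", "salaries"}]

def expand(word):
    """Collect a word, its stem variants, and single-word thesaurus phrases."""
    related = {word}
    for stem in STEMS:
        if word in stem:
            related |= stem
    for entry in THESAURUS:
        if any((w,) in entry for w in related):
            related |= {phrase[0] for phrase in entry if len(phrase) == 1}
    return related

variants = expand("salaries")
```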

24 A grouped structure is a set of some of the above structures, with a name
given to the set of structures to indicate that the set's component structures
are all grouped together under that name.

25 An associative search is a query that takes a set of words (a phrase) as input 
and determines output which consists of a set of answers, sorted by relevance, 
where each answer is as defined above; an answer can be viewed as a set of 
phrases from the database or databases being searched. 

26 The method of the present invention involves carrying out an associative 
search simultaneously on several heterogeneous databases. That is, on data 
which may be spread over several different database management systems. These 
systems may be incompatible with each other, use different data structures, 
and have their own different query languages. These systems might include 
relational data, notes, thesauri, stems, and/or grouped structures. 

27 The method of the present invention is not limited to the data structures 
defined above, and may be extended easily to include other types of
structures. How to carry out such an extension would be clear to one in the 
art from the present specification. 

28 C. Description and Shortcomings of Prior Art Methods

29 The present invention describes a method for performing associative searches
of several heterogeneous databases. The method overcomes many of the
shortcomings of the prior art methods for searching heterogeneous databases.
Some of the shortcomings include: the need for programming and database
know-how, the need for database structure and data formatting, the need for common
representations to enable integrating data from different databases, the
difficulty in performing a "fuzzy" search and the difficulty in using
ambiguous data.

30 Need for programming and database know-how

31 Typically, to perform a successful query on a database, one needs to have 
programming skills. To perform a query, a conceptual description of the query, 
for example, the query request expressed in everyday language, needs to be 
translated into a list of instructions for the computer to carry out the 
request. The conversion is typically not unique, and requires specific 
knowledge about the database and some know-how. Typically, the person carrying 
out the request needs to be a computer programmer. The cost of any query is 
thus high. 

32 As an example, in the case of relational databases, one needs to know which 
files exist, what the files' names are, and what the fields and their 
corresponding attributes are for each file. In addition, one needs to be 
familiar with the concepts of selecting records from a table, joining two 
tables and projecting fields. Finally, one needs to be familiar with a query 
language such as SQL or some other relational query system. 

33 Previous attempts at overcoming the programming know-how barrier to using 
databases include systems that help a user navigate through the schema to 
create a query, say an SQL query. This navigation process takes time, and 
still requires the user to understand the concepts of relational databases and 
the operations one can perform on them. 

34 The present invention uses a data-model with no schema. Hence, a user does not
need to know attributes (such as fields) to distinguish these from data

ABSTRACT:

An information retrieving apparatus comprises a retrieve instruction executing 
means for executing a retrieve instruction based on a retrieval formula described 
based on an arbitrary schema, a schema conversion means for converting the 
retrieval formula into another retrieval formula according to another schema based 
on pregiven rules, and a schema management means for managing the rules for 
converting the retrieval formula into the other retrieval formula, wherein the 
retrieve instruction executing means retrieves desired information based on the 
other retrieval formula. In this case, preferred embodiments are as follows. 
Exemplary Claim Number: 1

BRIEF SUMMARY:

1 BACKGROUND OF THE INVENTION 

2 The present invention relates to an apparatus for retrieving information, such
as commodity information, which has been built into databases in various
description formats and, more particularly, to an apparatus for retrieving
information on various commodities provided by different providers over a
communication network, e.g., the Internet.

3 Recently, mail-order selling through virtual shopping malls or
shopping pages provided over computer communication or the Internet has been
attracting attention.

4 However, consumers who purchase commodities through such shopping malls or
shopping pages have problems such as inability to find commodities that they
are looking for. Providers who provide commodities have a problem that their
customers do not visit their shops (or the customers do not access their home
pages). The term "commodities" here means not only material commodities but
also immaterial commodities. For example, in the case of a commodity provider
who is a broadcaster, the commodity is services such as programs that it
broadcasts.

5 Among the above-described problems, the inability to find commodities as a
problem at the consumers' side refers to situations as described below.

6 It refers to a situation wherein one cannot find a program that broadcasts a
piece of music he or she wishes to listen to from among programs for broadcast
or a situation wherein one cannot find a movie film that he or she wishes to
watch to see a certain actor performing because a program table shows only
general information. It further refers to such a situation that one cannot
find a home page that sells a certain commodity that he or she looks for over
the Internet.

7 From the providers' point of view, referring to current retrieval services in
the Internet as an example, there is a problem in that a new WWW (World Wide
Web) site opened on the Internet cannot be found by consumers because the
services employ a system in which commodities are accessed from the consumers'
side.

8 Conventional retrieval services in the Internet will now be specifically
examined.

9 When a user searches information on the WWW, in general, the user retrieves 
information by passing keywords for retrieval services to, for example, a 
retrieval engine or the like. However, since such retrieval services handle an 
enormous number of WWW pages, too many results of retrieval can be provided or 
irrelevant pages can be returned unless the keywords are specified properly. 
This is significant especially in the case of retrieval of commodities for on- 
line shopping. 

10 For example, let us assume here that pages of on-line shopping on the WWW are 
searched in an attempt to purchase a red polo shirt from a certain 
manufacturer. The user carries out retrieval by specifying "polo shirt",
"manufacturer name" and "red" as keywords, but the results returned include
many irrelevant pages, such as a page that introduces jeans from the relevant
manufacturer, pages for polo shirts from other manufacturers, and an essay on
polo shirts from Ralph Lauren that is irrelevant to the user's intention to
purchase.

11 Conversely, retrieval using keywords provided by retrieval services
searches only WWW pages including keywords that coincide with the input, and
it is not necessarily possible to find pages which seem to be relevant.
Specifically, as shown in FIG. 1, retrieval in search of a commodity named
"Blade Runner" can return only data which conform to all of the keywords
"commodity name" and "Blade Runner" as a result of retrieval (only the data
indicated by the solid line in FIG. 1). Therefore, the WWW pages including
information "Title: Blade Runner" or "Title in Japanese: Blade Runner"
indicated by the dotted lines in FIG. 1 cannot be obtained as a result of
retrieval. Thus, it is not necessarily possible to retrieve desired
information using keywords.
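
The limitation described with reference to FIG. 1 can be illustrated with the following sketch. The representation of each page as a mapping from attribute name to value and the function keyword_retrieve are assumptions made for illustration; the three pages correspond to the three items of data named in the preceding paragraph.

```python
# Each page is modelled as attribute/value data found on that page.
PAGES = [
    {"commodity name": "Blade Runner"},
    {"Title": "Blade Runner"},
    {"Title in Japanese": "Blade Runner"},
]

def keyword_retrieve(pages, attribute, value):
    """Return only pages in which both the attribute keyword and the value occur."""
    return [page for page in pages if page.get(attribute) == value]

hits = keyword_retrieve(PAGES, "commodity name", "Blade Runner")
missed = [page for page in PAGES if page not in hits]
```

Only the first page is returned; the two pages that describe the same commodity under the attributes "Title" and "Title in Japanese" are missed, although a consumer would regard them as relevant.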

12 As described above, since current retrieval of commodities is carried out on a
full text basis, there are problems in that a result of retrieval can include
many irrelevant things and in that a desired commodity cannot be found.

13 The above-described problems result from the fact that keywords to be retrieved
are associated with pages instead of commodities and the fact that keywords are
extracted from words that appear on pages and therefore the intentions of 
information providers that are not written on the pages (e.g., whether the 
pages are intended for selling or introduction) are not the object of 
retrieval. In order to solve such problems and to allow a user to utilize 
retrieval services intuitively, it is desirable to perform retrieval based on 
the features of commodities registered by information providers instead of 
keywords that are automatically extracted. 

14 BRIEF SUMMARY OF THE INVENTION

15 It is an object of the invention to provide an information retrieving 
apparatus which allows consumers to retrieve desired commodity information 
quickly and easily and consequently allows providers to present consumers with 
commodities that they can provide to consumers without special efforts. 

16 The present invention includes means as described below provided to solve the 
above-described problems. 

17 An information retrieving apparatus according to an aspect of the invention
comprises a retrieve instruction executing means for executing a retrieve 
instruction based on a retrieval formula described based on an arbitrary 
schema; a schema conversion means for converting the retrieval formula into 
another retrieval formula according to another schema based on pregiven rules; 
and a schema management means for managing the rules for converting the 
retrieval formula into the other retrieval formula, wherein the retrieve 
instruction executing means retrieves desired information based on the other 
retrieval formula. In this case, preferred embodiments are as follows. 

18 (1) The apparatus further comprises a database for storing retrieval object
information, in which the retrieval object information consists of a pair of 
attribute data and a value. 

19 (2) The retrieve instruction executing means retrieves desired information 
based on the retrieval formula and the other retrieval formula. 

20 (3) The schema management means manages attribute information of at least one 
schema using a hierarchical structure. 

21 (4) When the retrieval by the retrieve instruction executing means does not 
provide the desired information, the schema conversion means converts the 
layer of the attribute information into the layer above it, and the retrieve 
instruction executing means executes retrieval based on the result of the 
conversion. 

22 (5) The retrieve instruction executing means retrieves information stored in 
at least one database connected through a network. 
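
By way of illustration only, preferred embodiments (1), (3) and (4) can be 
sketched in Python as follows. Every name in the sketch (convert, generalize, 
execute, retrieve, RULES, PARENT, DATABASE) is hypothetical and does not appear 
in the specification. The sketch assumes that a retrieval formula is a list of 
attribute/value conditions, that the pregiven rules simply rename attributes, 
and that the attribute hierarchy is acyclic. 

```python
# Sketch only: all names are hypothetical, not taken from the specification.

Formula = list[tuple[str, str]]  # a retrieval formula as (attribute, value) conditions


def convert(formula: Formula, rules: dict[str, str]) -> Formula:
    """Schema conversion: rename attributes using pregiven rules.

    Attributes without a rule pass through unchanged.
    """
    return [(rules.get(attr, attr), value) for attr, value in formula]


def generalize(formula: Formula, parent: dict[str, str]) -> Formula:
    """Move every attribute that has a parent to the layer above it."""
    return [(parent.get(attr, attr), value) for attr, value in formula]


def execute(formula: Formula, database: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the records whose attribute/value pairs satisfy every condition."""
    return [rec for rec in database if all(rec.get(a) == v for a, v in formula)]


def retrieve(formula, rules, parent, database):
    """Convert the formula, then climb the hierarchy until something is found."""
    current = convert(formula, rules)
    # An acyclic hierarchy (assumed) has at most len(parent) layers to climb.
    for _ in range(len(parent) + 1):
        hits = execute(current, database)
        if hits:
            return hits
        broader = generalize(current, parent)
        if broader == current:  # already at the top layer
            break
        current = broader
    return []


# Hypothetical rules, hierarchy and attribute/value records.
RULES = {"kind": "category", "brand": "manufacturer"}
PARENT = {"manufacturer": "provider", "distributor": "provider"}
DATABASE = [
    {"category": "camera", "provider": "Acme"},
    {"category": "printer", "provider": "Acme"},
]

print(retrieve([("kind", "camera"), ("brand", "Acme")], RULES, PARENT, DATABASE))
```

In this sketch the query on "brand" is first converted to "manufacturer", which 
matches no record; only after that attribute is replaced by "provider", the 
layer above it, is the camera record returned. 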

23 An information retrieving apparatus according to another aspect of the 
invention comprises a meta-data storage section for storing meta-data in 
various formats; a schema declaration section for extracting a schema 
associated with meta-data for each of predetermined items of information from 
meta-data stored in the meta-data storage section to define a method of 
describing attributes; and a relation declaration section for defining a 
hierarchical relation between the attributes defined by the schema declaration 
section and attributes of other schemata. With this configuration, the 
apparatus further comprises a core attribute hierarchy declaration section in 
which a general hierarchical relation between attributes is described. 
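
As a rough illustration of this aspect, the following Python sketch derives a 
schema from stored meta-data and relates its attributes to a core hierarchy. 
The function names, the CORE_HIERARCHY mapping and the sample records are all 
hypothetical. The sketch assumes that each meta-data record is a flat mapping 
from attribute name to value. 

```python
# Sketch only: all names and sample data are hypothetical.

def declare_schema(records: list[dict[str, str]]) -> list[str]:
    """Schema declaration: collect the attribute names used by the records."""
    attributes = set()
    for record in records:
        attributes.update(record.keys())
    return sorted(attributes)


def declare_relations(schema_name: str, attributes: list[str],
                      core_hierarchy: dict[str, str]) -> dict[str, str]:
    """Relation declaration: map each schema attribute to the core attribute
    one layer above it, when the core hierarchy describes one."""
    return {
        f"{schema_name}.{attr}": core_hierarchy[attr]
        for attr in attributes
        if attr in core_hierarchy
    }


# Hypothetical core attribute hierarchy: specific attribute -> general attribute.
CORE_HIERARCHY = {"list_price": "price", "vendor": "provider"}

# Hypothetical meta-data records for one kind of item.
RECORDS = [
    {"title": "X", "list_price": "100"},
    {"title": "Y", "vendor": "Acme"},
]

schema = declare_schema(RECORDS)
relations = declare_relations("shop_a", schema, CORE_HIERARCHY)
print(schema)
print(relations)
```

Attributes such as "title" that have no entry in the assumed core hierarchy 
are simply left unrelated in this sketch. 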

24 According to the present invention, retrieval is carried out after performing 
a conversion into a desired schema using schema hierarchy constituted by 
layers between which each attribute is related. As a result, unnecessary 
information will not be retrieved, and desired commodity information can be 
quickly and easily retrieved even if the required information is ambiguously 
specified. 

25 Additional objects and advantages of the present invention will be set forth 
in the description which follows, and in part will be obvious from the 
description, or may be learned by practice of the present invention. The 
objects and advantages of the present invention may be realized and obtained 
by means of the instrumentalities and combinations particularly pointed out in 
the appended claims. 

DRAWING DESCRIPTION: 

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 

The accompanying drawings, which are incorporated in and constitute a part of the 
specification, illustrate presently preferred embodiments of the present invention 
and, together with the general description given above and the detailed description 
of the preferred embodiments given below, serve to explain the principles of the 
present invention in which: 

FIG. 1 illustrates an example of retrieval using keywords according to the prior 
art; 

FIG. 2 shows a schematic configuration of a retrieving apparatus according to an 
embodiment of the invention; 

FIGS. 3A and 3B illustrate operations of the retrieving apparatus according to the 
embodiment of the invention; 

FIG. 4 illustrates an example of data (meta-data) included in a schema management 
section 3; and 

FIG. 5 shows a schematic configuration of an apparatus for creating the meta-data 
shown in FIG. 4. 



DETAILED DESCRIPTION: 

1 DETAILED DESCRIPTION OF THE INVENTION 

2 An embodiment of the present invention will now be described with reference to 
the accompanying drawings. 

3 FIG. 2 shows a schematic configuration of a retrieving apparatus according to 
an embodiment of the invention. 

4 The retrieving apparatus according to the invention comprises a retrieve 
instruction executing section 1, a schema conversion section 2 and a schema 
management section 3. Referring to FIG. 2, a database 4 stores information; 
for example, it may be a WWW site on the Internet or a database on an 
intranet. 

5 The retrieve instruction executing section 1 retrieves desired data from the 
database 4 in response to a retrieve instruction. 

6 The schema conversion section 2 converts a retrieve instruction input to the 
retrieve instruction executing section 1 into a retrieve instruction which 
makes it possible to obtain the desired information. 

7 The schema management section 3 manages association of a predetermined schema 